This project aims to develop a machine learning model that predicts the loan eligibility of customers based on information provided in their application profile: a binary classification problem in which we predict whether a given loan would be approved or not. To this end, I will build, train, and evaluate several supervised learning models, using R data science libraries (tidymodels, ggplot2, dplyr) to facilitate data analysis and modeling.
Data are pulled from Kaggle (originally sourced from an Analytics Vidhya Hackathon).
Loans are a necessity of the modern world, supporting consumption, economic growth, and business operations. Many types of loans exist for different purposes across various stages of life; among them are home loans, which we tackle in this problem.
Dream Housing Finance company deals in all home loans. They have a presence across all urban, semi-urban and rural areas. Customers can apply for a home loan after the company validates their eligibility. The company wants to automate the loan eligibility process (real-time) based on customer detail provided in their application. The company wants to identify customer segments that are eligible for loan amounts so that they can specifically target these customers.
The data files provided consist of a training set (train.csv) and a test set (test.csv); the test set is identical in structure to the training set except that it lacks the loan status to be predicted. The training set consists of 614 observations on 13 variables; the test set consists of 367 observations on 12.
Since this project only employs supervised learning, I will use only the training set, performing a 70/30 split on train.csv and using the held-out response values to evaluate predictive accuracy.
First, I will import the raw dataset and examine its records to determine the necessary data cleaning and transformations. I will then conduct exploratory data analysis to visualize relationships and covariability, report my findings, and perform final tidying of the dataset before setting up the models. Next, I will split train.csv into a train and test set (70/30) and create validation sets to facilitate model selection and tuning. Six classification models of varying flexibility and complexity will be fit to the training set and evaluated on multiple performance metrics; the top 3 will be chosen for testing. Finally, I will analyze and interpret the results of the best models, followed by a detailed conclusion of my findings.
We begin by loading and examining the data and performing some initial data tidying and manipulation. We then utilize visualization tools to explore the dataset and make some final adjustments before continuing with our analysis.
loan_ds <- read.csv("~/Desktop/School/PSTAT/PSTAT 131/proj-final/project_data/train.csv")
str(loan_ds)
## 'data.frame': 614 obs. of 13 variables:
## $ Loan_ID : chr "LP001002" "LP001003" "LP001005" "LP001006" ...
## $ Gender : chr "Male" "Male" "Male" "Male" ...
## $ Married : chr "No" "Yes" "Yes" "Yes" ...
## $ Dependents : chr "0" "1" "0" "0" ...
## $ Education : chr "Graduate" "Graduate" "Graduate" "Not Graduate" ...
## $ Self_Employed : chr "No" "No" "Yes" "No" ...
## $ ApplicantIncome : int 5849 4583 3000 2583 6000 5417 2333 3036 4006 12841 ...
## $ CoapplicantIncome: num 0 1508 0 2358 0 ...
## $ LoanAmount : int NA 128 66 120 141 267 95 158 168 349 ...
## $ Loan_Amount_Term : int 360 360 360 360 360 360 360 360 360 360 ...
## $ Credit_History : int 1 1 1 1 1 1 1 0 1 1 ...
## $ Property_Area : chr "Urban" "Rural" "Urban" "Urban" ...
## $ Loan_Status : chr "Y" "N" "Y" "Y" ...
Observations:
Credit_History, a categorical variable, is encoded as numeric.
ApplicantIncome and CoapplicantIncome are monthly figures given in dollar amounts, while LoanAmount is given in terms of thousands. We ideally want these to be on the same scale.
colSums(is.na(loan_ds))
## Loan_ID Gender Married Dependents
## 0 0 0 0
## Education Self_Employed ApplicantIncome CoapplicantIncome
## 0 0 0 0
## LoanAmount Loan_Amount_Term Credit_History Property_Area
## 22 14 50 0
## Loan_Status
## 0
We observe missingness in the (numeric) variables
LoanAmount, Loan_Amount_Term, and
Credit_History. Note that is.na() does not
detect blank entries in character variables; thus, we must employ a
different methodology to identify empty strings.
sapply(loan_ds, function(x) table(as.character(x) == "")["TRUE"])
## Loan_ID.NA Gender.TRUE Married.TRUE
## NA 13 3
## Dependents.TRUE Education.NA Self_Employed.TRUE
## 15 NA 32
## ApplicantIncome.NA CoapplicantIncome.NA LoanAmount.NA
## NA NA NA
## Loan_Amount_Term.NA Credit_History.NA Property_Area.NA
## NA NA NA
## Loan_Status.NA
## NA
So we have blank entries in Gender, Married, Dependents, and Self_Employed. We will first convert these blanks into NA’s for easy identification and determine how to handle them at a later step.
We reload the dataset, reading blank entries as NA.
loan_ds <- read.csv(file="~/Desktop/School/PSTAT/PSTAT 131/proj-final/project_data/train.csv",
 header=TRUE, na.strings = c("",NA)) # read blanks as NA's
Feature engineering: ApplicantIncome and CoapplicantIncome are given in dollars, so we divide them by 1,000 to put them on the same scale as LoanAmount.
loan_ds$ApplicantIncome <- (loan_ds$ApplicantIncome)/1000
loan_ds$CoapplicantIncome <- (loan_ds$CoapplicantIncome)/1000
Redundancy: we drop Loan_ID, a unique identifier, since it is not relevant to our analysis.
loan_ds <- loan_ds[,-1]; colnames(loan_ds)
## [1] "Gender" "Married" "Dependents"
## [4] "Education" "Self_Employed" "ApplicantIncome"
## [7] "CoapplicantIncome" "LoanAmount" "Loan_Amount_Term"
## [10] "Credit_History" "Property_Area" "Loan_Status"
We convert categorical variables into factors.
# convert categorical variables into factor
loan_ds$Gender <- factor(loan_ds$Gender, levels = c("Male","Female"))
loan_ds$Married <- factor(loan_ds$Married, levels = c("Yes","No"))
loan_ds$Education <- factor(loan_ds$Education, levels = c("Graduate","Not Graduate"))
loan_ds$Self_Employed <- factor(loan_ds$Self_Employed, levels = c("Yes","No"))
loan_ds$Property_Area <- factor(loan_ds$Property_Area, levels = c("Rural","Semiurban","Urban"))
loan_ds$Loan_Status <- factor(loan_ds$Loan_Status, levels = c("Y","N"), labels = c("Yes","No"))
loan_ds$Credit_History <- factor(loan_ds$Credit_History, levels = c(1,0), labels = c("Yes","No"))
loan_ds$Dependents <- recode(loan_ds$Dependents,"3+"="3") %>%
as.factor()
loan_ds$Married <- factor(loan_ds$Married)
After transforming the data, we can now detect the correct amount of missingness in all variables.
colSums(is.na(loan_ds))
## Gender Married Dependents Education
## 13 3 15 0
## Self_Employed ApplicantIncome CoapplicantIncome LoanAmount
## 32 0 0 22
## Loan_Amount_Term Credit_History Property_Area Loan_Status
## 14 50 0 0
vis_miss(loan_ds) # visualize missing data
The table below provides a numerical summary of missingness in our dataset, showing the number of missing values in each variable, the percent missing, and a cumulative (running) total.
missingness <- loan_ds %>%
miss_var_summary(add_cumsum = TRUE) %>%
dplyr::arrange(n_miss_cumsum)
missingness
sum(missingness$pct_miss) # total % of missingness in dataset
## [1] 24.3
There are 149 missing values in total (the per-variable missingness percentages sum to ~24%). One approach is to omit all rows with missing values, or to remove variables with substantial missingness. Another possible solution is to impute missing values where appropriate. Neither option is appropriate this early in our analysis, since we do not yet know which variables are significant in prediction.
We will leave the dataset as is for now and continue with exploratory data analysis before returning to address missing values at a later step.
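Before deferring the decision, it is worth gauging what complete-case (listwise) deletion would cost. The sketch below is ours, not code from the original analysis; the helper name `complete_case_count` is hypothetical.

```r
# Sketch: count rows that would survive listwise (complete-case) deletion.
complete_case_count <- function(df) sum(complete.cases(df))

# Comparing complete_case_count(loan_ds) against nrow(loan_ds) shows how many
# observations a blanket na.omit() approach would discard.
```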
Loan_ID : Unique loan ID.
Gender : Male / Female.
Married : Whether the applicant is married (Yes/No).
Dependents : A factor indicating the number of dependents an applicant has (0, 1, 2, 3+).
Education : An applicant’s education level (Graduate / Not Graduate).
Self_Employed : Whether the applicant is self-employed (Yes/No).
ApplicantIncome : An applicant’s monthly income (in thousands of dollars).
CoapplicantIncome : A coapplicant’s monthly income (in thousands of dollars).
LoanAmount : The loan amount requested by an applicant (in thousands of dollars).
Loan_Amount_Term : The term of the loan in months.
Credit_History : Does the applicant’s credit history meet the bank’s requirements (Yes/No)?
Property_Area : An applicant’s area of residence (Urban / Semiurban / Rural).
Loan_Status : Whether the loan was approved (Yes/No).
This section consists of data exploration and visualization; we will first examine the response, generate a correlation matrix, and analyze the independent variables one by one to discern potential relationships.
First, we will look at the distribution of the response by creating a barplot.
loan_ds %>%
ggplot(aes(x = Loan_Status)) +
geom_bar() +
theme_grey() # create barplot
loan_ds %>%
select(Loan_Status) %>%
table() %>%
prop.table()
## Loan_Status
## Yes No
## 0.6872964 0.3127036
Approximately 69% of applicants were approved while 31% were rejected. This imbalance in classes may hinder our models’ ability to generate accurate predictions for Loan_Status. We will likely need to upsample or downsample the data at a later step.
Examining dependency among independent variables is a crucial step in our analysis, providing insight into relationships, interactions, and potential issues such as multicollinearity.
The corrplot() function generates a graphical display of a correlation matrix: the main diagonal contains each variable’s correlation with itself (always 1), and the off-diagonal cells contain pairwise correlation coefficients. The scale on the right side of the plot illustrates the strength and direction of the relationship for each pair. Note that corrplot() only accepts numeric variables.
The plot below illustrates the magnitude of correlation coefficients.
loan_ds %>%
select(where(is.numeric)) %>%
na.omit() %>%
cor() %>%
corrplot(method="number")
We observe a moderate, positive correlation between
LoanAmount and ApplicantIncome (0.57), and
very little correlation (+/- 0.20) among the other numeric predictors (a
good sign!). We will keep these findings in mind as we explore the
dataset.
In the next few sections, we will analyze our predictors one-by-one to examine their distribution and relationship with each other and the response.
We observe a right-skewed distribution, with most values falling between 0-400 thousand; a good insight into the average amount requested by each applicant. We also detect a few high outliers pulling the mean up. There is no significant variation between the average loan amounts requested by approved and rejected applicants; however, we see greater variation among the rejected applicants.
require(gridExtra)
plot1 <- loan_ds %>%
na.omit(LoanAmount) %>%
ggplot(aes(x=LoanAmount)) +
geom_histogram(bins=40) +
theme_grey()
plot2 <- loan_ds %>%
na.omit(LoanAmount) %>%
ggplot(aes(Loan_Status, LoanAmount)) +
geom_boxplot(na.rm=T) +
geom_jitter(alpha = 0.1) +
theme_grey()
grid.arrange(plot1, plot2, ncol=2)
anova(aov(LoanAmount ~ Loan_Status, loan_ds)) # insignificant difference
Recall that LoanAmount is positively correlated with
ApplicantIncome. A plot of LoanAmount by
ApplicantIncome shows an approximately linear trend,
indicating that applicants with higher income tend to request larger
loans.
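The linear trend just described can be sketched with ggplot2. This is an illustrative sketch, not code from the original analysis; the helper name `plot_amount_vs_income` is ours, and it assumes loan_ds has been tidied as above.

```r
# Sketch: LoanAmount vs. ApplicantIncome, stratified by Loan_Status.
library(ggplot2)
library(dplyr)

plot_amount_vs_income <- function(df) {
  df %>%
    filter(!is.na(LoanAmount), !is.na(ApplicantIncome)) %>%
    ggplot(aes(x = ApplicantIncome, y = LoanAmount, color = Loan_Status)) +
    geom_point(alpha = 0.5) +
    geom_smooth(method = "lm", se = FALSE) +  # linear trend per status group
    labs(x = "Applicant income (thousands)",
         y = "Loan amount (thousands)") +
    theme_grey()
}

# plot_amount_vs_income(loan_ds)
```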
When stratifying applicants based on
Loan_Status, we
observe similar approval rates across different loan amounts. However,
it is difficult to discern whether applicant incomes are predictive of
loan status. This brings us to our next section.
Applicant monthly incomes range from 0.15 to 81 thousand dollars, with the majority of values falling between 5 and 7 thousand.
summary(loan_ds$ApplicantIncome)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.150 2.877 3.812 5.403 5.795 81.000
We observe a right-skewed distribution; even with extreme outliers omitted, there is still an imbalance in the proportion of applicants falling within each income range. Average incomes among approved and rejected applicants are about the same.
plot1 <- loan_ds %>%
filter(ApplicantIncome < 50) %>% # omit high outliers
ggplot(aes(x=ApplicantIncome)) +
geom_histogram(fill="bisque",color="white",alpha=0.7, bins=10) +
geom_density() +
geom_rug() +
labs(x = "applicant income") +
theme_minimal()
plot2 <- loan_ds %>%
filter(ApplicantIncome < 50) %>%
ggplot(aes(y=ApplicantIncome,x=Loan_Status, color=Loan_Status))+
geom_boxplot() +
theme_grey()
grid.arrange(plot1, plot2, ncol=2)
Upon inspecting the three high outliers (ApplicantIncome > 50), we find that:
2 out of 3 applicants have 3 or more dependents, both of whom are self-employed;
2 out of 3 applicants were approved for a loan; both reside in an urban area and have good credit history;
2 out of 3 applicants are male;
all 3 applicants have a graduate degree and no coapplicant.
loan_ds %>%
filter(ApplicantIncome > 50)
These observations indicate that neither ApplicantIncome nor LoanAmount is very predictive of Loan_Status in the most extreme cases, seeing that the most affluent applicant, who requested the smallest amount, was rejected. In theory, we’d expect the opposite outcome.
We also observe that Property_Area and
Credit_History are relevant factors even in applicants with
very high incomes, seeing as the only applicant who was rejected resides
in a rural area and has bad credit history.
It’s difficult to discern whether Education or
Dependents affect loan status; we will examine these
factors in the next sections.
For now, let’s take a closer look at approval rates. Applicants with monthly incomes of 15-20 thousand have the highest approval rate (~77%), so higher incomes do somewhat translate to higher approval rates. However, we must note that the bins contain varying numbers of data points, possibly inflating (or understating) approval rates.
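The binned approval rates described above can be computed with a sketch like the following. The bin edges here are illustrative choices of ours, not taken from the original analysis, and the helper name `approval_by_income_bin` is hypothetical.

```r
# Sketch: approval rate by applicant-income bin (income in thousands).
library(dplyr)

approval_by_income_bin <- function(df) {
  df %>%
    filter(!is.na(ApplicantIncome)) %>%
    mutate(income_bin = cut(ApplicantIncome,
                            breaks = c(0, 5, 10, 15, 20, Inf),
                            labels = c("0-5", "5-10", "10-15", "15-20", "20+"))) %>%
    group_by(income_bin) %>%
    summarise(n = n(),                                  # bin size, for context
              approval_rate = mean(Loan_Status == "Yes"),
              .groups = "drop")
}

# approval_by_income_bin(loan_ds)
```

Reporting `n` alongside the rate makes the small-bin caveat above visible at a glance.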
A density plot grouped by Loan_Status indicates that
coapplicant incomes are right skewed, with a mix of high and low values
in each status category. Average coapplicant incomes for approved
applicants are slightly higher than that of rejected applicants.
loan_ds %>%
ggplot(aes(x=Loan_Status, y=CoapplicantIncome, color=Loan_Status)) +
geom_boxplot()
It is important to note that many coapplicant incomes are 0 (no coapplicant) across both categories, skewing the mean down significantly. Omitting those values provides better insight into its central tendency. The plot below suggests a different conclusion than what we assumed previously: of those with a coapplicant, applicants are approximately equally likely to be approved for a loan.
loan_ds %>%
filter(CoapplicantIncome != 0) %>%
ggplot(aes(x=Loan_Status, y=CoapplicantIncome, color=Loan_Status)) +
geom_boxplot()
Next, we examine whether having a coapplicant itself affects loan status. The table below shows that 273 out of 614 applicants (about 44%) do not have a coapplicant; a relatively large count.
loan_ds %>%
dplyr::count(CoapplicantIncome == 0)
The variable has_coapp has the value of FALSE if
CoapplicantIncome is 0, and TRUE otherwise. A contingency
table of Loan_Status and has_coapp indicates
that applicants with coapplicants are more likely to be approved (71% vs
65%).
loan_ds %>%
dplyr:: mutate(has_coapp = if_else(CoapplicantIncome != 0,TRUE,FALSE)) %>%
group_by(has_coapp, Loan_Status) %>%
dplyr:: summarise(n=n()) %>%
dplyr::mutate(freq = prop.table(n)) # on average, ~72% of applicants w/ a coapplicant were approved for a loan,
# while only ~65% of applicants w/o a coapplicant were approved.
Perhaps the presence of a coapplicant is more predictive of loan
status than the numerical value of their income. We should consider
transforming CoapplicantIncome into a factor.
Loan_Amount_Term gives the term of the loan in months. From the plot below, we see that it is left-skewed, meaning its mean (342 months, i.e., 28.5 years) is lower than its median and mode. Because there are so few points on the lower end, the mode is more representative of its center.
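Base R has no built-in function for the statistical mode, so a small helper (ours, not from the report) makes the mean/median/mode comparison concrete:

```r
# Sketch: most frequent value (statistical mode) of a vector, ignoring NAs.
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  as.numeric(names(which.max(table(x))))
}

# With the loan data loaded as above, one would compare:
# mean(loan_ds$Loan_Amount_Term, na.rm = TRUE)   # ~342, per the report
# median(loan_ds$Loan_Amount_Term, na.rm = TRUE)
# stat_mode(loan_ds$Loan_Amount_Term)            # 360, per the report
```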
loan_ds %>%
na.omit(Loan_Amount_Term) %>%
ggplot(aes(x=Loan_Amount_Term)) +
geom_bar() +
theme_grey() # most frequent value: 360
We now look at Loan_Amount_Term in relation to Loan_Status. Applicants requesting short-term loans are more likely to be approved, on average, but we do see a high approval rate for 360-month terms. We should also keep in mind that ~85% of loans have a term of 360 months; the data may under-represent applicants with alternative loan terms.
loan_ds %>%
na.omit(Loan_Amount_Term) %>%
ggplot(aes(x=Loan_Amount_Term, fill=Loan_Status)) +
geom_bar(position="fill")
Dependents is right-skewed; most applicants have no dependents.
Approval rates are similar across each category, with the highest
approval rate for 2 dependents. There is no clear pattern; this
indicates that Dependents may not be very influential in
determining their Loan_Status.
plot1<-loan_ds %>%
na.omit(Dependents) %>% # should be able to impute this later
ggplot(aes(x=Dependents)) +
geom_bar() # most applicants have no dependents
# dependents vs. loan status
plot2<-loan_ds %>%
na.omit(Dependents) %>%
ggplot(aes(x=Dependents, fill=Loan_Status)) +
geom_bar(position="fill")
# relatively similar likelihood of approval for each # of dependents
grid.arrange(plot1,plot2,ncol=2)
A natural question is whether applicants with dependents (indicative of a larger household) request a larger loan. The boxplots below indeed suggest that individuals with dependents do, on average, request a larger loan amount; however, the actual number of dependents does not seem very significant.
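The boxplots in question can be generated with a sketch like this (assuming the tidied loan_ds from above; the helper name `plot_amount_by_dependents` is ours):

```r
# Sketch: requested loan amount by number of dependents.
library(ggplot2)
library(dplyr)

plot_amount_by_dependents <- function(df) {
  df %>%
    filter(!is.na(Dependents), !is.na(LoanAmount)) %>%
    ggplot(aes(x = Dependents, y = LoanAmount)) +
    geom_boxplot() +
    labs(y = "Loan amount (thousands)") +
    theme_grey()
}

# plot_amount_by_dependents(loan_ds)
```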
We observe more male applicants than female applicants (81% vs 19%); thus females may be underrepresented. A natural question is whether there is bias in the selection process. From the plots below, we can see that females are indeed less likely (by about 8 percentage points) to be approved for a loan than males.
prop.table(table(loan_ds$Gender)) # more males than females
##
## Male Female
## 0.8136439 0.1863561
loan_ds %>%
na.omit(Gender) %>% # should be able to impute this later
ggplot(aes(x=Gender, fill=Loan_Status)) +
geom_bar(position="fill")
# 2-way contingency table
loan_ds %>%
na.omit(Gender) %>%
group_by(Gender, Loan_Status) %>%
dplyr:: summarise(n=n()) %>%
dplyr::mutate(freq = prop.table(n)) ## `summarise()` has grouped output by 'Gender'. You can override using the
## `.groups` argument.
Married applicants comprise a majority of our dataset (~65%), which is surprising since most applicants have no dependents. Nonetheless, a 65-35 split offers good contrast. Those who are married are about 10 percentage points more likely to be approved for a loan - a significant difference given the size of our dataset.
prop.table(table(loan_ds$Married))
##
## No Yes
## 0.3486088 0.6513912
loan_ds %>%
na.omit(Married) %>%
ggplot(aes(x=Married,fill=Loan_Status)) +
geom_bar(position="fill")
# 2-way contingency table
loan_ds %>%
na.omit(Married) %>%
group_by(Married, Loan_Status) %>%
dplyr:: summarise(n=n()) %>%
dplyr::mutate(freq = prop.table(n)) ## `summarise()` has grouped output by 'Married'. You can override using the
## `.groups` argument.
Education is a factor representing an applicant’s educational level, with levels “Graduate” and “Not Graduate”. Our dataset is comprised of ~78% graduates, which makes sense due to education loans. An 80-20 ratio, however, is definitely unbalanced. From the barplots, we can see that graduates are more likely to be approved (by about 8 percentage points).
prop.table(table(loan_ds$Education))
##
## Graduate Not Graduate
## 0.781759 0.218241
loan_ds %>%
na.omit(Education) %>%
ggplot(aes(x=Education,fill=Loan_Status)) +
geom_bar(position="fill")
# 2-way contingency table
loan_ds %>%
na.omit(Education) %>%
group_by(Education, Loan_Status) %>%
dplyr:: summarise(n=n()) %>%
dplyr::mutate(freq = prop.table(n)) ## `summarise()` has grouped output by 'Education'. You can override using the
## `.groups` argument.
Self_Employed is a factor that indicates if an applicant
is self-employed. Only 14% of applicants in the dataset are
self-employed. There is no significant difference in the approval rates
between self-employed and non self-employed individuals; a slight
difference of 4%, with non self-employed individuals having the higher
rate.
prop.table(table(loan_ds$Self_Employed))
##
## Yes No
## 0.1408935 0.8591065
loan_ds %>%
na.omit(Self_Employed) %>%
ggplot(aes(x=Self_Employed,fill=Loan_Status)) +
geom_bar(position="fill")
# 2-way contingency table
loan_ds %>%
na.omit(Self_Employed) %>%
group_by(Self_Employed, Loan_Status) %>%
dplyr:: summarise(n=n()) %>%
dplyr::mutate(freq = prop.table(n)) ## `summarise()` has grouped output by 'Self_Employed'. You can override using the
## `.groups` argument.
Credit_History is a factor indicating whether an
applicant’s credit satisfies the bank’s requirements. Most applicants do
have good credit history (~85%). Credit history appears to be a very
important factor, given that nearly 80% of applicants with good credit
history get approved, whereas only 10% of applicants with bad credit
history do. It is important to note, however, that there may be some
selection bias since individuals with good credit history may be more
inclined to apply in the first place.
prop.table(table(loan_ds$Credit_History))
##
## Yes No
## 0.8421986 0.1578014
loan_ds %>%
na.omit(Credit_History) %>%
ggplot(aes(x=Credit_History,fill=Loan_Status)) +
geom_bar(position="fill")
# 2-way contingency table
loan_ds %>%
na.omit(Credit_History) %>%
group_by(Credit_History, Loan_Status) %>%
dplyr:: summarise(n=n()) %>%
dplyr::mutate(freq = prop.table(n)) ## `summarise()` has grouped output by 'Credit_History'. You can override using
## the `.groups` argument.
Property_Area is a factor representing the area in which an applicant resides: Urban, Semiurban, or Rural. We have a good mix of applicants from all 3 areas; those residing in semiurban areas have the highest approval rate. We also observe that married individuals tend to prefer semiurban areas over others. This is an insightful finding because, as we recall, married individuals had an approval rate about 10 percentage points higher than non-married individuals. Coupled with the higher approval rate for those with coapplicants, we may just have found our target demographic!
prop.table(table(loan_ds$Property_Area))
##
## Rural Semiurban Urban
## 0.2915309 0.3794788 0.3289902
loan_ds %>%
na.omit(Property_Area) %>%
ggplot(aes(x=Property_Area,fill=Loan_Status)) +
geom_bar(position="fill")
# 2-way contingency table
loan_ds %>%
na.omit(Property_Area) %>%
group_by(Property_Area, Loan_Status) %>%
dplyr:: summarise(n=n()) %>%
dplyr::mutate(freq = prop.table(n)) ## `summarise()` has grouped output by 'Property_Area'. You can override using the
## `.groups` argument.
# is property area related to any other predictors?
loan_ds %>%
na.omit(Property_Area, Loan_Status, Married) %>%
dplyr:: mutate(has_coapp = if_else(CoapplicantIncome != 0,TRUE,FALSE)) %>%
ggplot(aes(x=Property_Area,fill=Married)) +
geom_bar(position="dodge") +
facet_wrap(~has_coapp)
Now that we’ve explored the dataset, we will need to fix some errors before continuing with our analysis. Let’s review these issues:
Missingness observed in 7 variables;
Extreme high outliers observed in ApplicantIncome
and LoanAmount.
Missingness exists in both numerical and categorical variables.
Therefore, we will be using the mice package, which imputes
missing values with plausible data values inferred from other variables
in the dataset.
# install and load
# install.packages("mice")
library(mice)
From the missing-data table below, we see that the first two variables are missing a large proportion of their values, while the last five are missing none.
loan_ds %>%
miss_var_summary()
Now we call mice(). The argument
m indicates the number of multiple imputations; the
standard is m = 5. The method argument specifies the
imputation method applied to all variables in the dataset; a separate
method can also be specified for each variable.
We can control the defaultMethod used for different
types of data. I will choose predictive mean matching for numeric data,
logistic regression for 2-level factors, linear discriminant analysis
for unordered factor data, and proportional odds for ordered factor
data.
imp <- mice(loan_ds, m=5, defaultMethod = c("pmm","logreg","lda","polr"))
Here, we can see the actual imputations for
Dependents:
imp$imp$Dependents
Now let’s merge the imputed data into our original dataset via the
complete() function.
loan_ds <- complete(imp,5) # I chose the 5th round of data imputation
Checking the missing data again, we see that there are no missing values after imputation:
loan_ds %>%
miss_var_summary()
Outliers can be tricky. It’s hard to determine whether they are data entry errors, sampling errors, or natural variation in our data. Removing records, however, may result in information loss. We will assume that the outliers reflect natural variation until proven otherwise.
Looking at LoanAmount, we see that the “extreme” values
are somewhat plausible. Some customers may want to apply for a loan as
high as 650 thousand.
zscore <- (abs(loan_ds$LoanAmount-mean(loan_ds$LoanAmount, na.rm=T))/sd(loan_ds$LoanAmount, na.rm=T))
loan_ds$LoanAmount[which(zscore > 3)]
## [1] 650 600 700 495 436 480 480 490 570 500 480 480 600 496
Since we have a positive skew, we will perform a log transformation to normalize the data. The transformed data looks closer to normal, and the effect of extreme outliers is significantly smaller.
loan_ds$LogLoanAmount <- log(loan_ds$LoanAmount)
plot1 <- loan_ds %>%
ggplot(aes(x=LoanAmount)) +
geom_histogram(bins=20) +
geom_density()+
labs(title="Histogram for Loan Amount") +
xlab("Loan Amount")
plot2 <- loan_ds %>%
ggplot(aes(x=LogLoanAmount)) +
geom_histogram(bins=20) +
geom_density()+
labs(title="Histogram for Log Loan Amount") +
xlab("Log Loan Amount")
grid.arrange(plot1,plot2,ncol=2)
We also have a pretty severe positive skew for ApplicantIncome, so we will perform a log transformation there as well. The data looks much better.
loan_ds$LogApplicantIncome <- log(loan_ds$ApplicantIncome)
plot1 <- loan_ds %>%
ggplot(aes(x=ApplicantIncome)) +
geom_histogram(bins=20) +
geom_density()+
labs(title="Histogram for Applicant Income") +
xlab("Applicant Income")
plot2 <- loan_ds %>%
ggplot(aes(x=LogApplicantIncome)) +
geom_histogram(bins=20) +
geom_density()+
labs(title="Histogram for Log Applicant Income") +
xlab("Log Applicant Income")
grid.arrange(plot1,plot2,ncol=2)
Now we remove the original variables from our dataset.
loan_ds <- select(loan_ds,-LoanAmount) # remove original variable
loan_ds <- select(loan_ds,-ApplicantIncome) # remove original variable
Now that we have a better idea of how the variables in our dataset impact loan status, it’s time to set up our models. We will perform our train/test split, create our recipe, then establish 10-fold cross-validation to help evaluate our models.
Before we do any modeling, we will need to randomly split our dataset into a train and test set. The reason why we split our data is to avoid overfitting; we will fit the models on the training data, then use those models to make predictions on the previously unseen testing data. The testing set is reserved to be fit only once after the models have “learned” from the train set. From there, we will use error metrics to evaluate each model’s performance. We will use a 70/30 split since our dataset is relatively small and we want to reserve enough data for the test set. We will set a random seed before our split so that we can replicate our results, and stratify on our response.
set.seed(3450)
loan_split <- initial_split(loan_ds, prop = 0.70,
strata = "Loan_Status")
loan_train <- training(loan_split)
loan_test <- testing(loan_split)
loan_folds <- vfold_cv(loan_train, v = 10, strata = "Loan_Status")
Dimensions of our datasets:
dim(loan_train); dim(loan_test)
## [1] 429 12
## [1] 185 12
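As a quick sanity check on the stratified split, both subsets should preserve the ~69/31 Yes/No ratio of the response. This is a sketch of ours, not code from the report; the helper name `response_props` is hypothetical.

```r
# Sketch: proportion of each Loan_Status level in a data frame.
response_props <- function(df) prop.table(table(df$Loan_Status))

# response_props(loan_train)
# response_props(loan_test)
```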
Now that we’ve completed all the preliminary steps, it’s time to build our recipe. Think of it as following a recipe for cut-out cookies. Because we’ll be using a variety of different molds (models), each cookie will look different, but their ingredients will be the same! Inside, they’re all the same flour and sugar and eggs! That’s what this recipe is; a unique mix of ingredients that will be fitted to different molds. Our goal turns into finding the best mold for our particular mix. From there, fitting the best model to our test data is analogous to using a different brand of the essential ingredients (ie., the test data), shaping the dough with our best cookie mold, then putting it into the oven!
In our recipe, we’ll be using 8 out of the 11 original predictors, 2
transformed variables LogLoanAmount and
LogApplicantIncome, plus a new variable
Coapplicant.
We’ll first need to upsample the data. Recall from earlier that our response was imbalanced; if we train our models on an imbalanced dataset, they can become better at identifying one level than the other, which is undesirable. Two solutions come to mind: upsampling or downsampling. Since we have a small dataset, step_upsample() is the better option. We’ll use over_ratio=1 so that there are as many Yes’s as No’s. Because upsampling is intended to be performed on the training set alone, the step defaults to skip=TRUE (it is skipped when the recipe is baked on new data). We’ll temporarily set skip=FALSE to verify that the counts have been equalized, then rewrite the recipe with the default.
Since the values of CoapplicantIncome do not appear to affect our response, we’ll transform it into a categorical variable Coapplicant indicating the presence or absence of a coapplicant. We’ll then scale and center our numeric predictors, and dummy-code the nominal predictors.
loan_recipe <- recipe(Loan_Status~., data=loan_train) %>%
step_upsample(Loan_Status, over_ratio = 1, skip = FALSE) %>%
step_mutate(Coapplicant = factor(if_else(CoapplicantIncome!=0, "Yes","No",NA))) %>%
step_rm(CoapplicantIncome) %>%
# transform coapplicant income into a factor
# Yes if CoapplicantIncome is not 0, No otherwise.
step_scale(all_numeric_predictors()) %>%
step_center(all_numeric_predictors()) %>% # scale and center
step_dummy(all_nominal_predictors()) # dummy-code nominal predictors
prep(loan_recipe) %>% bake(new_data = loan_train) %>%
group_by(Loan_Status) %>%
dplyr::summarise(count = n())
Now we rewrite the recipe with skip=TRUE:
loan_recipe <- recipe(Loan_Status~., data=loan_train) %>%
step_upsample(Loan_Status, over_ratio = 1, skip = TRUE) %>%
step_mutate(Coapplicant = factor(if_else(CoapplicantIncome!=0, "Yes","No",NA))) %>%
step_rm(CoapplicantIncome) %>%
# transform coapplicant income into a factor
# Yes if CoapplicantIncome is not 0, No otherwise.
step_scale(all_numeric_predictors()) %>%
step_center(all_numeric_predictors()) %>% # scale and center
step_dummy(all_nominal_predictors()) # dummy-code nominal predictors
We can use prep() and bake() to verify that the recipe worked.
prep(loan_recipe) %>%
bake(new_data = loan_train) %>%
kable() %>%
kable_styling(full_width = F) %>%
scroll_box(width = "100%", height = "200px")
| Loan_Amount_Term | LogLoanAmount | LogApplicantIncome | Loan_Status | Gender_Female | Married_Yes | Dependents_X1 | Dependents_X2 | Dependents_X3 | Education_Not.Graduate | Self_Employed_No | Credit_History_No | Property_Area_Semiurban | Property_Area_Urban | Coapplicant_Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2450727 | -0.0508723 | 0.1280076 | No | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0.2450727 | 0.3518664 | -0.4959078 | No | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 |
| 0.2450727 | -0.2724192 | -1.2439388 | No | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0.2450727 | -1.0479359 | -0.2761123 | No | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
| 0.2450727 | -0.4480157 | 0.9062231 | No | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |

(Remaining rows of the baked training set omitted here for brevity; the rendered document displays the full table in a scroll box.)
| 0.2450727 | -0.6211382 | -0.2160272 | Yes | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 0.2450727 | -0.0508723 | -0.9572567 | Yes | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 0.2450727 | -0.4851559 | -0.5054191 | Yes | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 0.2450727 | 0.3152009 | -0.6359220 | Yes | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0.2450727 | 0.1876430 | -0.7745305 | Yes | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0.2450727 | -0.3407357 | -0.6266457 | Yes | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0.2450727 | -0.0212181 | -0.7950616 | Yes | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| 0.2450727 | 0.3518664 | -0.3545057 | Yes | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 0.2450727 | -3.0995250 | 0.3955027 | Yes | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 0.2450727 | -0.8565103 | -0.6922375 | Yes | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 0.2450727 | 1.3045365 | 1.3310077 | Yes | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 0.2450727 | 0.3996854 | -0.1215449 | Yes | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 0.2450727 | 0.6223388 | -0.0114411 | Yes | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 0.2450727 | -0.3758313 | -0.2065155 | Yes | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 0.2450727 | 2.9039943 | 2.3206419 | Yes | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 0.2450727 | 0.9051283 | 0.5166005 | Yes | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0.2450727 | 0.0079834 | 0.4848070 | Yes | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 0.2450727 | -0.1743125 | 0.0891667 | Yes | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| 0.2450727 | -1.2052294 | -0.3545057 | Yes | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 0.2450727 | 1.4118166 | 2.0839571 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 0.2450727 | -0.3407357 | -0.4025331 | Yes | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| 0.2450727 | -1.2052294 | -1.0081758 | Yes | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| 0.2450727 | -0.6413782 | 0.1822323 | Yes | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| -2.6390938 | -1.6320270 | -0.5139802 | Yes | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 0.2450727 | 1.5265431 | 1.2094031 | Yes | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| -2.6390938 | 0.1476556 | -0.1324981 | Yes | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0.2450727 | 1.3045365 | 2.0334910 | Yes | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 0.2450727 | 0.7246444 | 0.4795737 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 0.2450727 | 0.3152009 | -0.5684804 | Yes | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| 0.2450727 | -0.0508723 | 0.4592549 | Yes | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 0.2450727 | 0.5142501 | -0.2061033 | Yes | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0.2450727 | 2.5399124 | 1.5863190 | Yes | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 0.2450727 | 0.5253381 | -0.3243523 | Yes | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 0.2450727 | 0.3397225 | -0.0830606 | Yes | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| 0.2450727 | -0.3758313 | -0.4011262 | Yes | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0.2450727 | -1.1780990 | -0.5653426 | Yes | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| -2.6390938 | -2.2755839 | -0.0385028 | Yes | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 0.2450727 | 0.6741755 | 0.8909165 | Yes | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
Notice that, by dummy-coding the nominal predictors, we’ve increased
the number of columns in our dataset. This is because each factor has
been transformed into k-1 dummy variables, with one level held out as
the reference (or baseline) level. The baseline level does not get its
own column; it is represented implicitly. For a given predictor, if the
dummy variables corresponding to every other level are all 0, then we
default to the baseline. For instance, if both Property_Area_Urban and
Property_Area_Semiurban are 0, then the applicant must be
from a Rural area.
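Under the hood, this is the same coding that base R’s model.matrix() produces. A tiny sketch with a hypothetical three-level factor standing in for Property_Area (not our actual recipe step):

```r
# Hypothetical three-level factor standing in for Property_Area
area <- factor(c("Rural", "Urban", "Semiurban"),
               levels = c("Rural", "Semiurban", "Urban"))

# model.matrix() performs the same k-1 dummy coding; "Rural", the first
# level, is the baseline and never gets its own column
dummies <- model.matrix(~ area)[, -1]  # drop the intercept column

dummies["1", ]  # Rural:     areaSemiurban = 0, areaUrban = 0 (baseline)
dummies["2", ]  # Urban:     areaSemiurban = 0, areaUrban = 1
dummies["3", ]  # Semiurban: areaSemiurban = 1, areaUrban = 0
```

In the recipe, step_dummy() does this coding for every nominal predictor at once.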
We will stratify on our response variable Loan_Status
and use 10 folds to perform stratified cross-validation. K-fold
cross-validation divides our data into k folds of roughly equal size,
holds out the first fold as a validation set, and fits the model on the
remaining k-1 folds as if they were the training set. This is repeated k
times, with a different fold used as the validation set each time. The
result is k estimates of the test MSE (or, in the classification case,
the test error rate), which are averaged to produce the cross-validation
estimate.
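Conceptually (ignoring stratification, and using base R rather than rsample), k-fold splitting just partitions the row indices; a sketch with a made-up n of 100, not tied to our actual split:

```r
set.seed(42)
n <- 100          # pretend we have 100 training rows
k <- 10
# randomly assign each row to one of k roughly equal-sized folds
fold_id <- sample(rep(1:k, length.out = n))

for (i in 1:k) {
  val_idx   <- which(fold_id == i)   # fold i is the validation set
  train_idx <- which(fold_id != i)   # the remaining k-1 folds train the model
  # fit the model on train_idx and compute the error metric on val_idx here
}
# averaging the k validation metrics gives the cross-validation estimate
```

Stratified sampling, as vfold_cv() does with strata, additionally keeps the class proportions of Loan_Status roughly equal across folds.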
loan_folds <- vfold_cv(loan_train, v = 10, strata = Loan_Status)
To save computational time, we will save the results to an RDA file; once we have the models we want, we can simply load them back in later at no cost.
save(loan_ds, loan_folds, loan_recipe, loan_train, loan_test,
file = "~/Desktop/School/PSTAT/PSTAT 131/proj-final/rda_files/loan-setup.rda")
It’s time to build our models! For efficiency and ease of access, I will build each model in a separate R file and save the results in RDA files. The models will then be loaded below for further exploration. This allows us to streamline our analysis and save on computational time.
For each model, we will:
1. Set up the model specification and combine it with our recipe in a workflow.
2. Fit the model (or its finalized workflow) to the training set and save the results to an RDA file.
For models requiring parameter tuning, we’ll also complete steps 3-5:
3. Use grid_regular to set up tuning grids of values for the parameters we’re tuning and specify levels for each.
4. Fit the models to our folds with tune_grid().
5. Select the best model by roc_auc and finalize the workflow.
Afterwards, we’ll load back in the saved files, collect error metrics, and analyze their individual performances.
The performance metric we’ll be using is roc_auc, which
stands for area under the ROC curve. The ROC (receiver operating
characteristic) curve is a popular graphic that plots the true positive
rate (TPR) against the false positive rate (FPR) at various threshold settings.
TPR is sensitivity (the proportion of positive observations that are
correctly classified), while FPR is 1-specificity (the proportion
of negative observations that are incorrectly classified as positive); the
higher the TPR at a given FPR, the better. The AUC (area under the curve)
measures the diagnostic ability of a classifier, summarizing the trade-off
between sensitivity and specificity across all thresholds.
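To make the metric concrete, here is a small base-R sketch on toy scores (not our model’s predictions). The rank-comparison formula at the end is mathematically equivalent to the area under the ROC curve:

```r
# toy predicted probabilities and true labels
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)
truth  <- c(1,   1,   0,   1,   1,   0,   0,   0)

# TPR and FPR at one example threshold
thr  <- 0.5
pred <- as.integer(scores >= thr)
tpr <- sum(pred == 1 & truth == 1) / sum(truth == 1)  # sensitivity
fpr <- sum(pred == 1 & truth == 0) / sum(truth == 0)  # 1 - specificity

# AUC via the Mann-Whitney statistic: the probability that a randomly
# chosen positive is scored above a randomly chosen negative
pos <- scores[truth == 1]
neg <- scores[truth == 0]
auc <- mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
```

In practice, yardstick’s roc_auc() computes this for us from the .pred_Yes column.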
It’s time to load our models back in to evaluate their results!
load(file= "~/Desktop/School/PSTAT/PSTAT 131/proj-final/rda_files/logistic.rda")
load(file= "~/Desktop/School/PSTAT/PSTAT 131/proj-final/rda_files/knn.rda")
load(file= "~/Desktop/School/PSTAT/PSTAT 131/proj-final/rda_files/en.rda")
load(file= "~/Desktop/School/PSTAT/PSTAT 131/proj-final/rda_files/lda.rda")
load(file= "~/Desktop/School/PSTAT/PSTAT 131/proj-final/rda_files/qda.rda")
load(file= "~/Desktop/School/PSTAT/PSTAT 131/proj-final/rda_files/decision-tree.rda")Here, we will visualize the results of our tuned models. We will use
the autoplot function to visualize the effect of varying
select parameters on the performance of each model according to its
impact on our metric of choice.
For the KNN model, we had 10 different levels of
neighbors. In general, the higher the number of
neighbors, the greater the roc_auc. The
roc_auc score of the best performing model (k=10) is
approximately 0.71, which is pretty decent.
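The mechanics behind the model are simple; a minimal base-R sketch of k-nearest-neighbors majority voting on a toy one-dimensional dataset (not our actual fitted model):

```r
# toy 1-D training points and labels
x_train <- c(0.2, 0.4, 0.5, 1.1, 1.3, 1.5)
y_train <- c("No", "No", "No", "Yes", "Yes", "Yes")

knn_predict <- function(x0, k) {
  nn <- order(abs(x_train - x0))[1:k]   # indices of the k nearest neighbors
  names(which.max(table(y_train[nn])))  # majority vote among their labels
}

knn_predict(1.0, k = 3)  # nearest neighbors are mostly "Yes" points
```

A larger k averages over more neighbors, smoothing the decision boundary, which is consistent with the higher roc_auc we observed at k = 10.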
autoplot(knn_tune_res)
In our elastic net model, we tuned 2 parameters with 10 levels
each: penalty, the amount of regularization, and
mixture, the proportion of lasso penalty (1 for pure lasso,
0 for pure ridge). We can see from the graph that the optimal mixture
was 0 (pure ridge): lower levels of mixture resulted in higher
roc_auc scores, and models performed worse as
penalty (the amount of regularization) increased.
autoplot(en_tune_res)
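To be explicit about what these two knobs control, here is a base-R sketch of the glmnet-style elastic net penalty term, with made-up coefficients:

```r
beta    <- c(0.5, -1.2, 0.3)  # hypothetical model coefficients
penalty <- 0.1                # overall amount of regularization (lambda)

# glmnet-style elastic net penalty:
# penalty * (mixture * sum(|beta|) + (1 - mixture)/2 * sum(beta^2))
en_penalty <- function(mixture) {
  penalty * (mixture * sum(abs(beta)) +
             (1 - mixture) / 2 * sum(beta^2))
}

en_penalty(1)  # pure lasso: 0.1 * (0.5 + 1.2 + 0.3)   = 0.2
en_penalty(0)  # pure ridge: 0.1 * 1.78 / 2            = 0.089
```

Because mixture = 0 removes the L1 term entirely, our best-performing elastic net is effectively a ridge model: coefficients are shrunk toward zero but never set exactly to zero.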
For our decision tree model, we focused on the parameter
cost_complexity and tuned it with 10 levels. Oftentimes
decision trees have too many splits, leading to a very complex model
that is likely to overfit the data. A smaller tree with fewer splits can
address this issue by yielding a simpler model (better interpretability,
at the cost of more bias).
The idea of cost-complexity pruning is similar to that of lasso /
ridge regularization: first, we grow a very large tree, then consider a
sequence of pruned subtrees and select the one that minimizes a
penalized error metric. The tuning parameter
cost_complexity controls the trade-off between a subtree’s
complexity and its fit to the training data: when
cost_complexity is 0, the criterion is just the training
error rate; as cost_complexity increases, the tree is
increasingly penalized for having too many terminal nodes.
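As a toy illustration of the criterion (the subtree sizes and error rates below are invented), we select the subtree minimizing R_alpha(T) = R(T) + alpha * |T|:

```r
# hypothetical candidate subtrees: number of terminal nodes and
# their training misclassification error
n_nodes   <- c(2, 5, 10, 20)
train_err <- c(0.30, 0.22, 0.18, 0.15)

# penalized criterion: R_alpha(T) = R(T) + alpha * |T|
cost <- function(alpha) train_err + alpha * n_nodes

# with alpha = 0 we simply pick the largest tree (lowest training error);
# a larger alpha penalizes size and a smaller subtree wins
which.min(cost(0))     # -> 4, the 20-node tree
which.min(cost(0.02))  # -> 2, the 5-node tree
```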
We can see from the plot below that a cost-complexity of about 0.25
yields the optimal model (highest roc_auc). This
indicates that pruning was the correct choice. Note that the parameter
uses the log10_trans() transformation by default, so all of the
values in our grid are on the log10 scale.
autoplot(dt_tune_res)
Here, we will compare the performance of each model on the training
data and create a visualization. I’ve created a tibble to display
the estimated roc_auc score of each fitted model on the training set.
log_auc <- augment(log_fit, new_data = loan_train) %>%
roc_auc(truth = Loan_Status, .pred_Yes) %>%
select(.estimate)
lda_auc <- augment(lda_fit, new_data = loan_train) %>%
roc_auc(truth = Loan_Status, .pred_Yes) %>%
select(.estimate)
qda_auc <- augment(qda_fit, new_data = loan_train) %>%
roc_auc(truth = Loan_Status, .pred_Yes) %>%
select(.estimate)
knn_auc <- augment(knn_final_fit, new_data = loan_train) %>%
roc_auc(truth = Loan_Status, .pred_Yes) %>%
select(.estimate)
en_auc <- augment(en_final_fit, new_data = loan_train) %>%
roc_auc(truth = Loan_Status, .pred_Yes) %>%
select(.estimate)
dt_auc <- augment(dt_final_fit, new_data = loan_train) %>%
roc_auc(truth = Loan_Status, .pred_Yes) %>%
select(.estimate)
roc_aucs <- c(log_auc$.estimate,
lda_auc$.estimate,
qda_auc$.estimate,
knn_auc$.estimate,
en_auc$.estimate,
dt_auc$.estimate)
mod_names <- c("Logistic Regression",
"LDA",
"QDA",
"KNN",
"Elastic Net",
"Decision Tree")
mod_results <- tibble(Model = mod_names,
ROC_AUC = roc_aucs)
mod_results <- mod_results %>%
dplyr::arrange(desc(ROC_AUC))
mod_results
While all of our models performed well on the training data, the best-performing model is
the KNN model with an roc_auc score of 0.94, with the QDA
model close behind at 0.86. I’ve created a lollipop plot below to help
visualize these results.
lp_plot <- ggplot(mod_results, aes(x = Model, y = ROC_AUC)) +
geom_segment(aes(x = Model, xend = Model, y = 0, yend = ROC_AUC)) +
geom_point(size = 7, color = "black", fill = alpha("blue", 0.3), alpha = 0.7, shape = 21, stroke = 3) +
labs(title = "Model Results") +
theme_minimal()
lp_plot
Now that we’ve identified our best models, we can further analyze their true performance. We will start with the KNN model, then analyze the QDA and elastic net models as a means of comparison.
So, the KNN model performed the best overall, but which value of
neighbors yields the best performance?
# select metrics of the best knn model
knn_tune_res %>%
collect_metrics() %>%
dplyr::arrange(desc(mean)) %>%
slice(1)
KNN model #10, with 11 predictors, 10 neighbors, and a mean
roc_auc score of 0.69, performed the best! Now that we have
our best model, we can fit it to our testing data to explore its true
predictive power.
Despite performing well on the training set, the KNN model performed poorly on our test data. In general, an AUC value between 0.7 and 0.8 is considered acceptable; the KNN model falls about 0.2 points short of the lower boundary.
knn10_roc_auc <- augment(knn_final_fit, new_data = loan_test) %>%
roc_auc(Loan_Status, .pred_Yes) %>%
select(.estimate)
knn10_roc_auc
Below is a confusion matrix of the test results, as well as an ROC curve plot.
knn_test_results <- augment(knn_final_fit, new_data = loan_test)
knn_test_results %>%
conf_mat(truth = Loan_Status, estimate = .pred_class) %>%
autoplot(type = "heatmap")
knn_test_results %>%
roc_curve(Loan_Status, .pred_Yes) %>%
autoplot()
In general, the more an ROC curve hugs the top-left corner of the plot, the better the AUC. While our curve is not perfect, it has the correct general shape.
Here’s a distribution of the predicted probabilities.
knn_test_results %>%
ggplot(aes(x = .pred_Yes, fill = Loan_Status)) +
geom_histogram(position = "dodge") + theme_bw() +
xlab("Probability of Yes") +
scale_fill_manual(values = c("blue", "orange"))
Now, it’s time to analyze our quadratic discriminant analysis (QDA) classifier. In short, it’s a more flexible version of the LDA model that finds a non-linear (quadratic) decision boundary between classes, assuming each class follows its own Gaussian distribution.
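For intuition, here is a base-R sketch of the QDA rule in one dimension with toy parameters (our actual model was fit on all predictors via tidymodels). Each class keeps its own mean and variance, so the discriminant scores are quadratic in x:

```r
# toy class-conditional Gaussians with unequal variances
mu    <- c(yes = 2,   no = 0)
sigma <- c(yes = 1.5, no = 0.5)
prior <- c(yes = 0.7, no = 0.3)

# QDA discriminant: log prior + log Gaussian density (quadratic in x
# because each class has its own variance)
delta <- function(x, k) {
  log(prior[k]) - log(sigma[k]) - (x - mu[k])^2 / (2 * sigma[k]^2)
}

classify <- function(x) {
  unname(ifelse(delta(x, "yes") > delta(x, "no"), "Yes", "No"))
}

classify(1.8)  # near the "yes" class mean
classify(0.1)  # near the "no" class mean
```

If the two variances were forced to be equal, the quadratic terms would cancel and the rule would reduce to LDA’s linear boundary.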
To my surprise, the QDA model performed better than the KNN model on the test set,
though its computed roc_auc score is only slightly higher.
Nevertheless, a 0.02-point increase is meaningful when it comes to AUC.
qda_roc_auc <- augment(qda_fit, new_data = loan_test, type = 'prob') %>%
roc_auc(Loan_Status, .pred_Yes) %>%
select(.estimate)
qda_roc_auc
Instead of fluctuating between concavity and convexity (as in the KNN case), the QDA model’s ROC curve is consistently concave; definitely an improvement.
augment(qda_fit, new_data = loan_test, type = 'prob') %>%
roc_curve(Loan_Status, .pred_Yes) %>%
autoplot()
Lastly, we explore the results of the elastic net model on our test
data. First, let’s compute its roc_auc score and then
create visualizations as needed.
The elastic net model performed the best out of our top 3 models,
with an roc_auc score of 0.75.
en_roc_auc <- augment(en_final_fit, new_data = loan_test, type = 'prob') %>%
roc_auc(Loan_Status, .pred_Yes) %>%
select(.estimate)
en_roc_auc
Its ROC curve looks much better than that of the KNN model, and is an improvement over the QDA model as well. From approximately 0.5 specificity onward, sensitivity sits near 1.0. This is a good sign!
augment(en_final_fit, new_data = loan_test, type = 'prob') %>%
roc_curve(Loan_Status, .pred_Yes) %>%
autoplot()
In this project, we tackled the problem of loan prediction given select demographics specified in applicant profiles. We worked with a relatively small dataset with a large number of features. We tidied the data, performed exploratory analysis, and fit a number of models of varying complexity and flexibility. Through analysis, testing, and assessment, we found the elastic net model to be the best suited for predicting the loan status of an applicant. However, the model was not perfect and leaves room for improvement.
In fact, none of our models performed particularly well on this problem. This could be due to a variety of factors, such as violated model assumptions or overfitting. None of the models considered were particularly robust against overfitting (with the exception of the elastic net).
Both the logistic and LDA models assume a linear decision boundary and are prone to error in higher dimensions. The elastic net (ridge) model is efficient for variance reduction but risks increased bias; in our problem, it seems to have reduced variance by a greater margin, thus decreasing overall error. The QDA model, while an improvement over LDA and logistic regression, is not as flexible as KNN or a decision tree. For more complex decision boundaries, a non-parametric approach may be preferred. The decision tree has high variance and tends to overfit. KNN doesn’t require linear separability and makes no distributional assumptions; however, it does not model relationships between predictors very well and is also prone to overfitting.
Given that the QDA and elastic net (Ridge) models performed relatively well on the test set, we can infer that the relationships in our data are non-linear. A potential improvement would be to consider alternative non-linear models or non-linear extensions to some of our models. Another option would be to consider non-parametric approaches.
As measured by our error metric, roc_auc, the
elastic net model outperformed the QDA model on the test set while
under-performing on the training set. It is important to note that
neither model had particularly high predictive accuracy (both around
0.7-0.8), likely because neither is well suited to
dimensionality reduction. A more flexible approach, such as a random
forest, may be better suited to our data. Its ability to down-weight redundant
features and noise would reduce the influence of misleading inputs and
subsequently improve model accuracy.
It’s also worth acknowledging that none of our models performed particularly poorly either. Loan prediction is no easy feat, and predictive models are undoubtedly prone to nuisance factors and noise. In addition, our dataset was incomplete; the inclusion of factors such as an applicant’s age or race would provide a clearer picture of the company’s target demographic and possibly reveal implicit biases in lending. With this understanding, assigning a class label to each applicant based on a select few demographics seems unfair. Instead, applicants should be assessed on a case-by-case basis.